Apache Pig's Optimizer
Abstract
Apache Pig allows users to describe dataflows to be executed in Apache Hadoop. The distributed nature of Hadoop, together with its execution paradigms, opens up many execution opportunities but also imposes constraints on the system. Given these opportunities and constraints, Pig must decide how to optimize the execution of user scripts. This paper covers some of those optimization choices, focusing on the ones that are specific to the Hadoop ecosystem and to Pig's common use cases. It also discusses optimizations that the Pig community has considered adding in the future.
Journal:
- IEEE Data Eng. Bull.
Volume 36, Issue
Pages -
Publication date: 2013